EVALUATION OF EDUCATIONAL RESEARCH IN THE NETHERLANDS



Summary 

To get an impression of the quality of educational research in the Netherlands,
55 paper proposals accepted by the Paper Committee of the Educational Research
Day 1974 were evaluated. Each of the 204 judges evaluated 2 randomly assigned
proposals on 27 characteristics. These characteristics were an extension of
the instrument used by a committee of the AERA in a similar study (Wandt, 1968).
The proposals showed a number of specific shortcomings, and the general
impression was weak. The factorial validity of the instrument was determined by
factor analysis. Multiple regression analysis showed that the instrument could
predict the general impression of the research reasonably well.
EVALUATION OF EDUCATIONAL RESEARCH IN THE NETHERLANDS

Max van der Kamp - Kohnstamm Institute of Educational Research,
University of Amsterdam

Leo J.Th. van der Kamp - Department of Psychology, University of Leiden

Educational research in the Netherlands has proliferated rapidly in recent
years. One indication of this proliferation is the number of participants in
the Annual Meetings of the Dutch Educational Research Association. The first
annual meeting was held in 1974 with a total of about 550 participants. In view
of the total number of inhabitants of our small country, this amounts to
approximately 1 educational researcher to 22000 inhabitants. It should be
noted, however, that the term "educational researcher" is used rather loosely
in this context. Facilities for formal training in educational research have
only recently been created at our universities, so most of the above-mentioned
educational researchers cannot be considered a species bred in this new
discipline. Another indication of the proliferation of educational research in
the Netherlands may be obtained from an examination of the development of
educational research as an institutionalized activity. To give a bird's-eye
view of the history of post-war educational research, its development will be
divided into periods of ten years each.

Period 1: 1950-1960, the early beginnings

In this period educational practice first paid serious attention to educational
research activities, insofar as their products could be used in educational
practice. There was also a burgeoning of validation studies of educational
instruments. In the second half of the 1950s research and development
activities of more than incidental importance were given more thorough thought
than before. Funds for educational research were made available, albeit
reluctantly.

Period 2: 1960-1970, Dutch educational research growing towards maturity

Particularly in the second half of the 1960s educational research boomed.
This sudden increase in educational research activities coincided with the
tremendous increase in the number of students enrolled in our universities. A
signal event in this period was the establishment of the Foundation for
Educational Research (SVO). This foundation has proven to be an important
factor in the growth of educational research, and it still serves as a major
coordinating force in this research area. In this period, following
developments in the United States, an important study was undertaken on
educational opportunity and social equality, the Dutch Project Talent, and the
first compensatory programs were tried out. At several institutes for tertiary
education, research teams or special departments were established to do
research in higher education.

Period 3: 1970-, years of stabilization?

Dutch educational research was still in a stage of development in the early
1970s. University graduates, actually the Dutch doctorandi, who had majored in
psychology, pedagogy, or sociology and who had just finished their studies took
up educational research. These research activities mainly took place at
universities, university educational research centres, educational research
institutes, or at the newly founded Central Institute for Test Development
(CITO), which was modelled after the Educational Testing Service. One might say
that educational research in the Netherlands has nowadays become one of the
flourishing "real establishments". How educational research will develop in the
years to come is difficult to foresee. Among other things, educational
research, like other fields of research, will depend on the country's economic
situation, or rather the economic situation of the developed countries.
Education and educational research activities, however, have a low degree of
autonomy. Education is not a field of political decision-making with its "own"
goals and its "own" instruments of policy. To a great extent it depends on the
general frame of reference, on the societal goals. Actually, few societal goals
are free of educational influence, but education is never the only policy
instrument for the achievement of such goals. So how educational research in
the Netherlands will develop during the second half of this decade also depends
on its impact on educational practice as well as on education at large.

Let us now turn to Dutch educational research, and the quality of published
educational research in particular. It is a truism to say that an increase in
educational information will be of little value for educational practice unless
the quality of the research is agreed upon. Of course, the requisite quality
and needed form of educational information will vary according to the technical
needs and expertise of the target audience (Vockell & Asher, 1974). Our study
should be seen as a first attempt to assess the quality of educational research
in the Netherlands. Hopefully, it will not give too gloomy a picture of the
present state of Dutch educational research.

OBJECTIVES

The objectives of the investigation were: 

(a) to develop an instrument for evaluation of educational research, 

(b) to get a general impression of the overall quality of educational
research in the Netherlands, and

(c) to identify the specific shortcomings of paper proposals for the 1974 
Annual Meeting of the Dutch Educational Research Association as accepted 
by the Paper Committee. 

METHOD 

Selection of Material and Judges 

Our study was closely related to the 1974 Annual Meeting, where 55 Dutch
investigators presented their research findings. The paper proposals of the
investigators were used as the material to be evaluated. A random sample of
360 judges was drawn from the total group of participants (mainly educational
researchers) at the 1974 Annual Meeting. The judges were mainly psychologists
(42%), educationalists (20.6%), sociologists (10.8%) and mathematicians.
Each of the judges had to evaluate two randomly assigned proposals. The
evaluation instrument was an extension of that devised by the AERA Committee on
Evaluation of Research (Wandt, 1968).

Firstly, each judge was asked to rate the proposals in terms of 27 specific
research characteristics. For each characteristic a five-point scale was used,
representing five levels of quality: (5) excellent, (4) good, (3) mediocre,
(2) poor and (1) completely incompetent.

If the characteristic did not apply to the proposal, the judge was instructed
to rate it as 'does not apply'.

Secondly, the judge was asked to rate his general impression of the overall
quality of the research reviewed. In addition, each judge was asked whether
there was any overlap with other research in the field, and was asked to rate
his own expertness to evaluate the proposal assigned to him and his
acquaintance with other publications of the author of the particular paper
proposal. The response percentage on the questionnaire was 57%. The total
number of proposals evaluated and available for analysis was 389.
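The figures in this paragraph can be checked against each other with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; rounding the number of responding judges to a whole person is our own assumption, as are the variable names.

```python
# Figures quoted in the text; the rounding step is an assumption of this sketch.
sampled_judges = 360
response_rate = 0.57
proposals_per_judge = 2
ratings_available = 389

# Number of judges who returned the questionnaire, rounded to whole persons.
responding_judges = round(sampled_judges * response_rate)

# Upper bound on the number of evaluations if every respondent rated both proposals.
max_ratings = responding_judges * proposals_per_judge

# Evaluations that are missing relative to that upper bound.
ratings_lost = max_ratings - ratings_available

print(responding_judges, max_ratings, ratings_lost)
```

Under these assumptions roughly 205 judges responded, which is close to the 204 judges mentioned in the Summary, and about twenty of the possible evaluations were not available for analysis.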

Data analysis 

(1) Frequencies, percentages, means and standard deviations were calculated
to describe the judges' ratings and to evaluate the quality of the
proposals.

(2) Correlation coefficients were computed for the ratings assigned by the
judges to the 27 characteristics, the general impression of the quality
of the research reviewed, the subjective expertness of the judges and
the reputation of the authors.

(3) Multiple regression analysis was used to examine the interrelations
between the 27 characteristics and the general impression of the research
reviewed.

(4) The dimensionality of the instrument was identified by principal
components analysis followed by VARIMAX rotation.

(5) Differences between the 11 divisions at the 1974 Annual Meeting were
examined by multiple discriminant analysis.
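Steps (1) and (2) can be illustrated with a small sketch. The ratings below are synthetic random numbers, not the study's data; coding 'does not apply' as a missing value and correlating each pair of characteristics over the ratings available for both are assumptions of this sketch, since the text does not say how missing ratings were handled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the rating matrix: 389 evaluations x 27 characteristics,
# each on the five-point scale (1 = completely incompetent ... 5 = excellent).
n_ratings, n_items = 389, 27
ratings = rng.integers(1, 6, size=(n_ratings, n_items)).astype(float)

# Assumption: about 10% of the cells are 'does not apply', coded as NaN.
ratings[rng.random(ratings.shape) < 0.10] = np.nan

# Step (1): descriptive statistics per characteristic, ignoring missing ratings.
means = np.nanmean(ratings, axis=0)
sds = np.nanstd(ratings, axis=0, ddof=1)
counts = np.sum(~np.isnan(ratings), axis=0)


def pairwise_corr(x, y):
    """Pearson correlation over the evaluations where both ratings are present."""
    ok = ~np.isnan(x) & ~np.isnan(y)
    return np.corrcoef(x[ok], y[ok])[0, 1]


# Step (2): correlation matrix of the 27 characteristics.
corr = np.array([
    [pairwise_corr(ratings[:, i], ratings[:, j]) for j in range(n_items)]
    for i in range(n_items)
])
```

Pairwise deletion keeps as many ratings as possible per correlation, at the price that different cells of the matrix rest on different subsets of evaluations.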

Limitations of the Study 

The source of the data, the 1974 Annual Meeting of the Dutch Educational
Research Association, had some implications for the investigation. An important
limitation was the selection of judges and of the specific research
contributions reviewed, i.e. the paper proposals submitted to the 1974 Annual
Meeting and accepted by the Paper Committee. These proposals, published in the
Proceedings of the Annual Meeting, were limited in size. This may well be the
reason why there was a fair amount of missing data.
The results must be interpreted in the light of the above remarks.

RESULTS 

The Quality of the Proposals 

Means, standard deviations and the total number of observations are presented
for each division separately and for all papers together (Table 1). The
papers were classified into the following divisions:

(A) Analysis of objectives (2) 

(B) Assessment procedures (2) 

(C) Methodology (4) 

(D) Learning and teaching systems (5) 

(E) Test development (3) 

(F) Evaluation (8) 

(G) Innovation (8) 

(H) Cognitive functions (8)

(J) Survey research: student characteristics (6) 
Shortcomings of the Paper Proposals

From the last column of Table 1 it may be concluded that the judges have been
rather lenient in their judgement of the paper proposals. In general the
formulation of the problem was clear enough, and the way of reporting the
investigation was unbiased as well. Furthermore, the total stock of studies was
judged to be more pertinent to educational practice than to theory
construction. On 16 out of the 28 characteristics used, the papers as a whole
were rated as insufficient. Among other things, the limitations of the study
were not clearly stated, nor were the relationship with previous research and
the assumptions made clear. A clear definition of the most important terms used
by the investigators was lacking. Other shortcomings had to do with sampling
techniques, the validity of the information gathered, the presentation of the
analysis, the way the conclusions were drawn from the material gathered, and
the generalizability of the research findings. The general impression of the
total stock of papers presented at the 1974 Annual Meeting of the Dutch
Educational Research Association was rather weak.

A Comparison with the Results of the American Study 

In order to facilitate a crude comparison, the ratings of the Dutch research
proposals were compared with the ratings published by Wandt (1968). The latter
ratings were based on a similar American study in which 81 articles on
educational research were judged with respect to their quality.

In Table 2 the mean ratings of the American articles on educational research
are juxtaposed with those of the Dutch paper proposals. From Table 2 it can be
seen that, apart from some differences, the similarities in the findings are
remarkable. The differences have to do with the formulation of the hypotheses,
the description of the population studied and the sampling procedure.
Although the specific nature and subject matter of the American articles on
educational research are unknown to us, it might well be that much
hypothesis-testing research is included, and that consequently more attention
is paid to the formulation of the hypotheses. Another difference is the way of
reporting and presenting the findings and the conclusions. This difference is
not very remarkable either, bearing in mind that the American study used
journal articles, while paper proposals were used in the Dutch study. It is a
well-known fact that manuscripts for publication are heavily screened; paper
proposals, on the other hand, are generally more loosely formulated as they
pertain to
Table 2. Mean ratings of 81 American articles on educational research and of
55 paper proposals for the 1974 Annual Meeting of the Dutch Educational
Research Association (DERA).

                                                        mean rating
characteristics                                  81 American   55 paper
                                                 articles      proposals (DERA)

 1. Problem is clearly stated                       3.41          3.43
 2. Hypotheses are clearly stated                   3.04          2.78
 3. Contribution to theory                          -             2.85
 4. Contribution to educational practice            3.31          3.32
 5. Contribution to societal issues                 -             3.16
 6. Assumptions are clearly stated                  2.40          [illegible]
 7. Limitations of the study are stated             [illegible]   2.33
 8. Important terms are defined                     2.84          2.54
 9. Relationship of the problem to previous
    research is made clear                          2.60          2.29
10. Research design is described fully              3.03          3.07
11. Research design is appropriate to the
    solution of the problem                         2.65          3.07
12. Research design is free of specific
    weakness                                        2.42          2.95
13. Population and sample are described             3.18          2.70
14. Method of sampling is appropriate               2.85          2.32
15. Data-gathering methods or procedures are
    appropriate to the solution of the problem      2.99          2.95
16. Data-gathering methods or procedures are
    utilized correctly                              3.01          3.06
17. Reliability of the procedures used              [illegible]   [illegible]
18. Validity of the evidence gathered               [illegible]   [illegible]
19. Appropriate methods are selected to
    analyze the data                                2.83          3.15
20. Methods utilized in analyzing the data
    are applied correctly                           3.11          3.08
21. Results of the analysis are presented
    clearly                                         3.11          2.68
22. Conclusions are clearly stated                  3.06          2.89
23. Conclusions are substantiated by the
    evidence presented                              2.53          2.71
24. Generalizations are confined to the
    population from which the sample was drawn      3.07          2.58
25. Report is clearly written                       3.21          2.79
26. Report is logically organized                   3.46          3.24
27. Tone of the report displays an unbiased,
    impartial, scientific attitude                  2.42          3.42

[For rows 17 and 18 the values 2.81, 2.49 and 2.70 survive, but their row and
column positions cannot be recovered.]

research studies the conclusions of which are not final.
The American and the Dutch experts who evaluated the papers agree with
respect to the formulation of the problem, the significance and the
objectivity of the research report. In the Dutch study as well as in the
American one, the formulation of the limitations of the investigation, the
definition of the most important terms, and the relations with earlier research
turned out to be weak.

Finally, the findings in our study are similar to those of the Wandt study in
the sense that the way conclusions were drawn from the material gathered was
judged to be unacceptable.

Authors' Reputation

In our study the judges were asked to rate on a five-point scale whether they
were familiar with other publications of the authors of the paper proposals.
The mean of these ratings was 1.48 (with an extreme upper value of 2.56). This
finding might imply either that the authors of the paper proposals for the
Annual Meeting have a low publication rate, or that the judges are not familiar
enough with, and do not keep themselves informed of, what is going on in the
field of educational research in the Netherlands. We are of the opinion that
the latter explanation is more plausible than the former, the more so in view
of the tentative conclusions of the Examining Committee for Social Research.
This committee came to the conclusion that much social research is badly
documented.

Subjectively Perceived Expertness

The paper proposals were randomly assigned to the judges. The question,
however, is whether the judges considered themselves sufficiently qualified to
assess the quality of the paper proposals. Therefore the judges were asked to
rate their own expertness with respect to the evaluation of the assigned paper
proposals. Again, ratings were given on a five-point scale. The mean of the
subjectively perceived expertness was 2.72, with a standard deviation s=0.87.
So the judges in general perceived themselves as not being qualified enough to
evaluate the paper proposals. One may ask whether educational research has
developed into such diverse areas that no one dares to call himself an expert
in any specific field. Possibly a number of the non-respondents did not
cooperate for this very reason. To investigate whether the same holds if the
overall ratings are split up into the separate divisions of educational
research, the means of the ratings were calculated for each of the eleven
divisions (Table 3).
Table 3. The subjectively perceived expertness of the judges for each division

division                                     mean          standard dev.

Survey research: teacher characteristics     3.00          0.67
Analysis of objectives                       2.86          1.10
Innovation                                   2.85          0.72
Evaluation                                   2.82          0.83
Learning and teaching systems                [illegible]   [illegible]
Assessment procedures                        2.75          1.06
Cognitive functions                          2.63          0.82
Miscellaneous                                2.61          1.02
Survey research: student characteristics     2.58          0.92
Test development                             2.50          1.01
Methodology                                  2.43          0.84




Overlap with other Research 

To get an idea of whether there was much overlap in the research, the judges
were asked whether overlap did actually occur. Of the judges, 24.6% affirmed
that there was overlap in research. Not all overlap in research, however, is
inefficient; replications of investigations might be very useful. Even so,
24.6% overlap is perhaps an alarming percentage. This overall overlap in
research can also be split up into the above-mentioned 11 divisions. The
overlap percentages for the separate divisions appear in Table 4.

Agreement between Judges 

Agreement between judges with respect to a single paper proposal differed
considerably. The mean intercorrelation varied between r=0.05 and r=0.54. Total
interrater agreement turned out to be r=0.26, which is a poor overall
interjudge correlation. Such low interrater agreement should not surprise us.
Goldberg (1968), for example, in research on clinical judgement, found a median
correlation of r=0.38 between experts who had to judge the severity of ulcers.
Expertness was no guarantee of consensus, according to Goldberg. To study
Table 4. Percentage of judges indicating overlap in research.

divisions                                    percent

Learning and teaching systems                [illegible]
Assessment procedures                        31.5
Survey research: student characteristics     30.8
Evaluation                                   29.5
Innovation                                   27.6
Analysis of objectives                       25.4
Test development                             22.5
Miscellaneous                                21.6
Survey research: teacher characteristics     18.1
Cognitive functions                          [illegible]
Methodology                                  11.3




whether the subjectively perceived expertness is independent of general
agreement between the judges, the rank correlation was calculated between
subjectively perceived expertness on the one hand and the mean interrater
agreement for a single paper proposal on the other. This yielded a correlation
coefficient of r_s=0.09. In our study, too, the expertness of judges (at least
subjectively perceived expertness) is unrelated to consensus. Apparently the
norms used to evaluate research still differ a great deal among "experts".
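A rank correlation of this kind can be sketched as follows. The text does not say which rank coefficient was used; the sketch assumes Spearman's coefficient, computed as the Pearson correlation of average ranks. The per-proposal values are synthetic, and the function names are our own.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


# Synthetic stand-ins, one value per paper proposal (55 proposals):
# mean self-rated expertness of its judges and mean interrater agreement.
rng = np.random.default_rng(2)
expertness = rng.uniform(1, 5, size=55)
agreement = rng.uniform(0.05, 0.54, size=55)

r_s = spearman(expertness, agreement)
```

A value of r_s near zero, as reported in the text, means that proposals rated by self-declared experts did not attract more consistent judgements than the others.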

Discriminant Analysis between Divisions 

Statistical testing of the means and standard deviations of the divisions on
each of the characteristics separately did not yield any significant
difference. The differences between the 11 divisions can also be examined by
discriminant analysis. Discriminant analysis yielded [number illegible]
discriminant functions, which accounted for 73.6% of the total variance. Only
the first and the second discriminant functions were statistically significant,
together explaining 50.3% of the total variance. These two discriminant
functions gave two new so-called discriminant variables, which may tentatively
be interpreted as "scope versus precision" and "theoretical orientation versus
practical orientation". Figure 1 shows both discriminant functions with a plot
of the 11 divisions in this two-dimensional space. Clearly the 11 divisions
differ considerably from each other.
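The share of variance attributed to each discriminant function can be illustrated with a minimal sketch. It assumes the classical formulation in which the discriminant functions are the eigenvectors of W^-1 B (within-group and between-group scatter matrices) and each function's share is its eigenvalue divided by the sum of eigenvalues. The data are synthetic, and only 5 variables are used to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: 11 divisions, 30 ratings each, 5 variables (all hypothetical).
n_groups, n_per_group, n_vars = 11, 30, 5
group_means = rng.normal(scale=0.8, size=(n_groups, n_vars))
X = np.vstack([m + rng.normal(size=(n_per_group, n_vars)) for m in group_means])
labels = np.repeat(np.arange(n_groups), n_per_group)

grand_mean = X.mean(axis=0)
W = np.zeros((n_vars, n_vars))  # within-group scatter
B = np.zeros((n_vars, n_vars))  # between-group scatter
for g in range(n_groups):
    Xg = X[labels == g]
    mg = Xg.mean(axis=0)
    W += (Xg - mg).T @ (Xg - mg)
    diff = (mg - grand_mean)[:, None]
    B += len(Xg) * (diff @ diff.T)

# Eigenvalues of W^-1 B, largest first; they are real and non-negative in theory,
# so only the real part is kept to discard numerical noise.
eigvals = np.sort(np.linalg.eigvals(np.linalg.solve(W, B)).real)[::-1]

share = eigvals / eigvals.sum()   # share of discriminating variance per function
cumulative = np.cumsum(share)     # e.g. cumulative[1] is the share of the first two
```

The number of functions is at most the smaller of the number of variables and the number of groups minus one, which is why the sketch yields five eigenvalues.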



Insert Figure 1 about here



Dimensionality of the Instrument 

To study whether the instrument used to evaluate the paper proposals tapped
different fundamental aspects, the matrix of correlations between the 28
characteristics was factor analyzed (principal components analysis). Table 5
gives the factor loadings of the 28 characteristics on 5 factors rotated
according to the VARIMAX criterion (only factor loadings above 0.40 are
reported). These 5 factors, accounting for 52% of the total variance, may be
interpreted as follows.

I   Results/conclusions/reporting (variance explained 18%)
II  Research methodology (variance explained 17%)
III Formulation of the problem in a wider scope (variance explained 11%)
IV  Significance for education (variance explained 7%)
V   Description of the population/sampling (variance explained 7%)

These findings are in agreement with the results obtained from a hierarchical
cluster analysis of the same data.
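The procedure described here, principal components of the correlation matrix followed by VARIMAX rotation, can be sketched as below. The data are synthetic, and the `varimax` function is a common textbook iteration based on the singular value decomposition, not necessarily the program used in the study.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a loading matrix towards the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(
            np.sum(rotated ** 2, axis=0)
        )
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


# Synthetic ratings with a 5-factor structure: 389 evaluations x 28 characteristics.
rng = np.random.default_rng(4)
n_obs, n_items, n_factors = 389, 28, 5
true_loadings = rng.normal(scale=0.6, size=(n_items, n_factors))
scores = rng.normal(size=(n_obs, n_factors))
data = scores @ true_loadings.T + rng.normal(size=(n_obs, n_items))

# Principal components of the correlation matrix; keep the 5 largest.
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:n_factors]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

rotated = varimax(loadings)
explained = (rotated ** 2).sum(axis=0) / n_items  # variance share per rotated factor
salient = np.abs(rotated) > 0.40                  # loadings that would be reported
```

Because the rotation is orthogonal, the total variance explained by the 5 factors is unchanged; only its distribution over the factors shifts, which is why the per-factor percentages after rotation differ from the unrotated eigenvalues.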



Insert Table 5 about here



The Validity of the Instrument

Multiple regression analysis was used to study whether the 'general impression
of the research' (characteristic 28) could be predicted from the other 27
characteristics. The multiple correlation turned out to be R=0.83, and the
corresponding variance explained was 69%.

A tentative interpretation of the multiple regression analysis might be that
high-quality educational research is, according to the judges in our study,
relevant research of practical significance that is methodologically well
designed and clearly presented by the researcher. To what extent the 27
characteristics can actually be used to obtain a general impression of a
research report cannot be answered as yet. A cross-validation needs to be
undertaken to answer this question.
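The relation between the multiple correlation R and the variance explained can be made concrete with a short sketch. The regression below is fitted to synthetic data by ordinary least squares; only the value R=0.83 is taken from the text, and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 389 evaluations, 27 predictor ratings, one overall rating.
n, p = 389, 27
X = rng.normal(size=(n, p))
weights = rng.normal(size=p)
y = X @ weights + rng.normal(scale=2.0, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef

# R squared is the share of variance explained; R is its square root,
# which equals the correlation between observed and fitted values.
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
multiple_r = np.sqrt(r_squared)

# The value reported in the text: R = 0.83 corresponds to about 69% of the variance.
reported_share = 0.83 ** 2
```

Since R squared computed on the same sample used for fitting tends to be optimistic, especially with 27 predictors, the cross-validation called for in the text is the natural next step.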
Table 5. Factor loadings of the rotated factor solution. Only loadings above
0.40 are included (decimal points are omitted).

[The column alignment of the loadings under factors I-V was lost. Loadings are
listed per row where the row assignment survives.]

Characteristics                                          Loadings

 1. Problem is clearly stated                            53, 53
 2. Hypotheses are clearly stated                        66, 43
 3. Contribution to theory                               45, 54
 4. Contribution to educational practice                 84
 5. Contribution to societal issues                      78
 6. Assumptions are clearly stated                       75
 7. Limitations of the study are stated                  41, 40
 8. Important terms are defined                          54
 9. Relationship of the problem to previous
    research is made clear                               63
10. Research design is described fully
11. Research design is appropriate to the
    solution of the problem
12. Research design is free of specific weakness
13. Population and sample are described
14. Method of sampling is appropriate
15. Data-gathering methods or procedures are
    appropriate to the solution of the problem
16. Data-gathering methods or procedures are
    utilized correctly
17. Reliability of the procedures used
18. Validity of the evidence gathered
19. Appropriate methods are selected to analyze
    the data
20. Methods utilized in analyzing the data are
    applied correctly
21. Results of the analysis are presented clearly
22. Conclusions are clearly stated
23. Conclusions are substantiated by the evidence
    presented
24. Generalizations are confined to the population
    from which the sample was drawn
25. Report is clearly written
26. Report is logically organized
27. Tone of report displays an unbiased,
    impartial, scientific attitude

Loadings printed beside rows 10-24, row assignment lost:
44, 57, 70, 74, 76, 68, 80, 78, 41, 50

Further loadings, row assignment lost: 58, 41, 79, 82, 69, 73, 67

Loadings printed beside rows 25-27, row assignment lost: 71, 75, 59

Variance explained (factors I-V, as printed): 11%, 17%, 9%, 7%, 18%

DISCUSSION

The Perception of Educational Research in the Netherlands

We have outlined the shortcomings of our study earlier. The material judged
consisted of a cross-section of educational research presented at a certain
time, while the group of judges was chosen in a haphazard way. Therefore it
does not seem justified to generalize our findings in order to make statements
about educational research in the Netherlands at large.
Nevertheless, the results of our study are both distressing and alarming, the
more so as the Dutch educational system is changing. Educational research
might contribute to a more rational change of the present educational system.
Policy decisions about changes to the educational system should rely more on
sound and methodologically well-designed research than they have until now.

The findings of our study suggest the following recommendations: 

(a) the consumers of educational research information should adopt a more 
critical attitude towards the quality of educational research; 

(b) educational research reports should be thoroughly evaluated, and 

(c) one should guard against too rapid and careless dissemination of
educational information.

These recommendations are based on the tacit assumption that educational
decision makers as well as teachers will use the information obtained in
educational research in formulating their policy guidelines and in their
teaching activities.

The Evaluation Instrument 

The instrument used seems to be useful for a critical evaluation of educational
research papers. As can be seen from the results of the factor-analytic
procedure, groups of items of the evaluation form may be distinguished, with
each group tapping a specific aspect of research papers. These are the
following specific aspects of the quality of the paper proposals:

I.   Results, conclusions and clarity of reporting
II.  Adequate design
III. Clarity of the problem and of the hypotheses
IV.  Significance for education
V.   Description of the population and representativeness of the sample.

The items of the form also discriminate between the 11 divisions; hence some
discriminant validity might be claimed for this evaluation form for research
papers. It also turned out that the perceived overall quality of a paper
proposal can be predicted rather accurately from the individual items.